Practical Simulation of Large-Scale Parallel Programs and Its Performance Analysis of the NAS Parallel Benchmarks
Abstract
A simulation technique for very large-scale data-parallel programs is proposed. In our simulation method, a data-parallel program is divided into computation and communication sections. When the control flow of the parallel program does not depend on the contents of network messages, the computation time on each processor can be calculated independently. An instrumentation tool called EXCIT is used to calculate the execution time on the target architecture and to generate message traces. The communication time is then calculated from these message traces by a network simulator, which is generated by the network-simulator generating system INSPIRE. With our tool set, the behavior of parallel programs on thousands of processors can be estimated within a practical time span. We demonstrate our method by analyzing the class B problems of the LU and MG programs of the NAS Parallel Benchmarks, examining various parameters such as cache size and network bandwidth. We found that communication overhead affects the total execution time considerably, while the cache effect is small.
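The split into independently timed computation and trace-driven communication can be illustrated with a minimal replay sketch. It assumes each processor's trace is a fixed sequence of compute intervals, non-blocking sends and blocking receives (valid precisely because control flow does not depend on message contents), and it swaps in a simple latency-plus-bandwidth network model with no contention. It is not EXCIT or INSPIRE; the names `Compute`, `Send`, `Recv` and `replay` are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Compute:
    seconds: float  # computation time, assumed already estimated per processor


@dataclass
class Send:
    dst: int
    nbytes: int


@dataclass
class Recv:
    src: int


def replay(traces, latency, bandwidth):
    """Return per-processor finish times for fixed message traces.

    Assumed model (hypothetical): a send costs the sender nothing, a message
    arrives latency + nbytes / bandwidth after it is sent, and a receive
    blocks until the matching message has arrived. Network contention is
    ignored. Messages between a given (src, dst) pair are matched in FIFO order.
    """
    n = len(traces)
    clock = [0.0] * n
    pc = [0] * n  # index of the next unprocessed event on each processor
    in_flight = defaultdict(deque)  # (src, dst) -> queue of arrival times
    progressed = True
    while progressed:
        progressed = False
        for p in range(n):
            while pc[p] < len(traces[p]):
                ev = traces[p][pc[p]]
                if isinstance(ev, Compute):
                    clock[p] += ev.seconds
                elif isinstance(ev, Send):
                    arrival = clock[p] + latency + ev.nbytes / bandwidth
                    in_flight[(p, ev.dst)].append(arrival)
                else:  # Recv
                    queue = in_flight[(ev.src, p)]
                    if not queue:
                        break  # sender has not reached its Send yet; retry later
                    clock[p] = max(clock[p], queue.popleft())
                pc[p] += 1
                progressed = True
    if any(pc[p] < len(traces[p]) for p in range(n)):
        raise RuntimeError("deadlock: some receives are never matched")
    return clock
```

For example, with `latency=0.25` and `bandwidth=1000.0`, a ping-pong of two 500-byte messages between processors that compute for 1.0 s and 0.5 s respectively finishes at `[3.0, 2.25]`. Because computation intervals enter only as fixed durations, each processor's computation can be estimated separately (for instance under different cache sizes) and only the cheap replay step needs to be rerun when network parameters change.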
Similar resources
Parallel Simulation of Large-scale Parallel Applications
Accurate and efficient simulation of large parallel applications can be facilitated with the use of direct execution and parallel discrete-event simulation. This paper describes MPI-SIM, a direct execution-driven parallel simulator designed to predict the performance of existing MPI and MPI-IO applications. MPI-SIM can be used to predict the performance of these programs as a function of archite...
Performance Characteristics of Hybrid MPI/OpenMP Implementations of NAS Parallel Benchmarks SP and BT on Large-Scale Multicore Clusters
The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore clusters provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node and MPI can be used with the communication between nodes. In this paper, we use SP and BT benchma...
Phase-Based Parallel Performance Profiling
Parallel scientific applications are designed based on structural, logical, and numerical models of computation and correctness. When studying the performance of these applications, especially on large-scale parallel systems, there is a strong preference among developers to view performance information with respect to their “mental model” of the application, formed from the model semantics used...
Implementation and evaluation of HPF/SX V2
We are developing HPF/SX V2, an HPF compiler for vector parallel machines. It provides some unique extensions as well as the features of HPF 2.0 and HPF/JA. This paper describes in particular four of them: 1) the ON directive of HPF 2.0, 2) the REFLECT and LOCAL directives of HPF/JA, 3) vectorization directives, and 4) automatic parallelization. We evaluated these features through some benchmar...
Portable High Performance and Scalability of Partitioned Global Address Space Languages
Large scale parallel simulations are fundamental tools for engineers and scientists. Consequently, it is critical to develop both programming models and tools that enhance development time productivity, enable harnessing of massively-parallel systems, and to guide the diagnosis of poorly scaling programs. This thesis addresses this challenge in two ways. First, we show that Co-array Fortran (CA...
Publication date: 1998